Identification of miRNAs of Strongyloides stercoralis L1 and iL3 larvae isolated from human stool

Strongyloidiasis is a neglected tropical disease caused by the soil-transmitted nematode by Strongyloides stercoralis, that affects approximately 600 million people worldwide. In immunosuppressed individuals disseminated strongyloidiasis can rapidly lead to fatal outcomes. There is no gold standard for diagnosing strongyloidiasis, and infections are frequently misdiagnosed. A better understanding of the molecular biology of this parasite can be useful for example for the discovery of potential new biomarkers. Interestingly, recent evidence showed the presence of small RNAs in Strongyloididae, but no data was provided for S. stercoralis. In this study, we present the first identification of miRNAs of both L1 and iL3 larval stages of S. stercoralis. For our purpose, the aims were: (i) to analyse the miRNome of L1 and iL3 S. stercoralis and to identify potential miRNAs of this nematode, (ii) to obtain the mRNAs profiles in these two larval stages and (iii) to predict potential miRNA target sites in mRNA sequences. Total RNA was isolated from L1 and iL3 collected from the stool of 5 infected individuals. For the miRNAs analysis, we used miRDeep2 software and a pipeline of bio-informatic tools to construct a catalog of a total of 385 sequences. Among these, 53% were common to S. ratti, 19% to S. papillosus, 1% to Caenorhabditis elegans and 44% were novel. Using a differential analysis between the larval stages, we observed 6 suggestive modulated miRNAs (STR-MIR-34A-3P, STR-MIR-8397-3P, STR-MIR-34B-3P and STR-MIR-34C-3P expressed more in iL3, and STR-MIR-7880H-5P and STR-MIR-7880M-5P expressed more in L1). Along with this analysis, we obtained also the mRNAs profiles in the same samples of larvae. Multiple testing found 81 statistically significant mRNAs of the total 1553 obtained (FDR < 0.05; 32 genes expressed more in L1 than iL3; 49 genes expressed more in L3 than iL1). Finally, we found 33 predicted mRNA targets of the modulated miRNAs, providing relevant data for a further validation to better understand the role of these small molecules in the larval stages and their valuein clinical diagnostics.

females) can be distinguished by their morphology and behavior. In particular, L1 have a short buccal canal and their energy consumption is mainly related to feeding and growth, while iL3 have a closed buccal cavity and a thick cuticle that enables them to survive in the external environmental and into the host's gastrointestinal tract 6 .
In the field of transcriptomics, previous studies that characterized the mRNA profiles of S. stercoralis, contributed to understanding the role of transcripts in the evolution of this nematode. The majority of these studies performed mRNA profiling of iL3 and adults, using PV001 and UPD strains isolated from non-human stool [7][8][9][10][11] . For example, Stoltzfus et al. 7 provided PV001 strain transcriptomics data sets of seven developmental stages including L1 and iL3 isolated from gerbil. Ramanathan et al. 12 described DNA microarray for S. stercoralis UPD strain isolated from dog stool to compare iL3 with L1, suggesting a high degree of specificity between these stages. RNAseq data sets of S. stercoralis iL3 from a stool sample of one infected human individual only were published by Marcilla et al. 13 , and Rodpai et al. 14 analysed the effect of dexamethasone on transcriptome profiles of pooled adult worms isolated from stool of infected patients. In the present study we investigated miRNAs of two larval stages L1 and iL3 through RNAseq approach. Small RNAs have been studied extensively in nematodes 15,16 . Using RNAseq, miRNAs were investigated in Strongyloididae including S. ratti and S. papillosus 17,18 , natural parasites of Rattus norvegicus and ruminants, respectively. The close relation of these species to S. stercoralis, suggested that this type of small RNAs might also be expressed in S. stercoralis.
Aims of this work were to investigate for the first time the miRNome of S. stercoralis, and to identify potential miRNAs specific of this nematode. Moreover, the analysis of mRNAs from the same samples, allowed us to predict potential targeting between the identified miRNAs and mRNAs.

Results
Identification of miRNAs and differential expression of mature miRNAs. Predictions of mature and precursor miRNAs were made using all the reads of all the 8 samples (iL3 and L1). We estimated 385 and 208, mature and precursor unique miRNA sequences, respectively (see Supplementary Material 1). Some miRNA mature sequences that were detected in the analyses showed 100% identity with miRNAs of S. ratti (n = 206), of S. papillosus (n = 72) and of Caenorhabditis elegans (n = 4). Novel detected miRNAs did not show any match with either S. ratti, S. papillosus or C. elegans (n = 169). See Fig. 1A. Differential gene expression analysis was only performed on mature miRNAs (n = 294) that presented 20 or more normalized counts from the sum of the counts from all 8 samples (see Supplementary Material 2). Comparing the miRNA expression in L1 and iL3, we observed 6 modulated miRNAs (|log2FoldChange|> 1 and nominal p-value < 0.05); 4 miRNAs were expressed more in iL3 than L1 (STR-MIR-34A-3P, STR-MIR-8397-3P, STR-MIR-34B-3P, STR-MIR-34C-3P); 2 miRNAs expressed more in L1 than iL3 (STR-MIR-7880H-5P, STR-MIR-7880M-5P). However, miRNA expression between L1 and iL3 was not significantly different when correction for multiple testing was applied. Functional annotation of S. stercoralis genes. The available S. stercoralis annotations, which comprise 9290 annotated sequences, still include a significant number of hypothetical proteins or proteins of unknown function (n = 5754). To try and assign a function to these, we functionally annotated the aminoacidic sequences of S. stercoralis proteins (ver. WBPS14) by transferring functional information using precomputed orthologous groups and phylogenies from the eggNOG database. The sequences were classified into 24 eggNOG functional sub-categories; functional description and number of genes belonging to each sub-category are reported in Table 1. Starting from 13,098 proteins, a total of 9986 sequences resulted as functionally annotated when combining both already available WBPS14 annotations and eggNOG annotations. Nonetheless 5058 (38.6%) sequences remain of unknown function, including: 3112 (23.8%) with no annotation and 1946 (14.8%) annotated as "function unknown". Differential gene expression. Comparing the transcripts expression in L1 and iL3, we found 1553 modulated genes (|log2FoldChange|> 1 and nominal-pvalue < 0.05; 802 genes were more expressed in L1 than iL3; 751 genes were more expressed in L3 than iL1) as shown in Fig. 2 (see Supplementary_Material_3 for details and Supplementary Fig. S1). When accounting for multiple testing, the differential expression of only 81 of the 1553 genes was statistically significant (FDR < 0.05; 49 genes more expressed in L1 than iL3; 32 genes more expressed in iL3 than L1). Because L1 and iL3 larvae represent different larval stages of the same worm, statistical tests were performed on paired data obtained from the same individual.
Gene set enrichment analysis. To determine whether S. stercoralis gene sets showed biased expression in L1 and iL3, a total number of 32 gene sets were created: each of the 24 eggNOG sub-category obtained by functional annotation was considered as a gene set; 4 gene-sets were manually generated to include genes known to be differentially expressed from the literature 12 (i.e., a heat-shock protein encoding gene set, a putative antigen www.nature.com/scientificreports/ gene set, a prokaryotes heat-shock protein gene set, and a gene set containing genes encoding products known to be immunoreactive in S. stercoralis-infected humans); and finally, 4 randomly generated gene sets to test the method robustness were included in the analysis. Thirty of the 32 gene sets met the criteria for inclusion into the Gene Set Enrichment Analysis (GSEA), while "Function unknown" and "Cell motility" gene sets were excluded. Of these 30 gene sets, 6 were significantly (p < 0.05) enriched in the iL3 phenotype, namely: cytoskeleton, extracellular structures, immunoreactive genes, inorganic ion transport and metabolism, signal transduction mechanisms, and transcription. Other 13 gene sets were significantly (p < 0.05) enriched in the L1 phenotype: amino acid transport and metabolism; carbohydrate transport and metabolism; cell cycle control, cell division, chromosome partitioning; coenzyme transport and metabolism; energy production and conversion; lipid transport and metabolism; nuclear structure; nucleotide transport and metabolism; post-translational modification, protein turnover, chaperones; replication, recombination and repair; RNA processing and modification; secondary metabolites biosynthesis, transport and catabolism; translation, ribosomal structure and biogenesis. Further details regarding significantly enriched gene sets are reported in Table 2 (see also Supplementary Material 4 and Supplementary Fig. S2A,B). mRNA 3′-UTR and miRNA target prediction. Messenger RNA sequences were analysed to predict three prime untranslated regions (3′-UTR), which are potential targets of miRNA. A total of 6480 3′-UTR were www.nature.com/scientificreports/ predicted and considered as candidate targets for the 6 modulated miRNAs (see previous section). Following miRNAs target analysis, a total of 33 possible targets were identified (Supplementary Material 5) including, transcripts involved in transcription, translation, metabolism and energy production, cytoskeleton rearrangements, signal transduction, and 8 transcripts of unknown function. Six transcripts of the 33 putative targets showed to be differentially expressed between L1 and iL3 (at nominal p-value < 0.05) ( Table 3 and Supplementary Material 5). STR-MIR-34C-3p was reported to putatively target thirteen of the 33 transcripts. Moreover, out of the 33 target transcripts, six had a significant level of differential expression between L1 and iL3 (p-value < 0.05). Among these 6 transcripts, the microtubule-binding protein MIP-T3 and another transcript without annotation (SSTP_0000331600) resulted more expressed in iL3 with FC > 1 and putatively targeted by STR-MIR-34C-3p.

Discussion
In this study, we obtained for the first time the miRNA profile of S. stercoralis L1 and iL3 isolated from human stool. The analysis of transcriptional regulation mechanisms and the differences in miRNome and miRNAs expression profiles between the infective and non-infective larvae in particular, may represent a starting point for an increased understanding of the biological operating mechanism of these molecules. In animals, miRNAs demonstrated to play a role in development and differentiation 19,20 . Small RNAs have also been investigated in nematodes; for instance, miRNAs were demonstrated to be responsible for the developmental switching of C. elegans [21][22][23][24][25] . Moreover, miRNAs have been investigated in Strongyloididae including S. ratti and S. papillosus 17,18 , but data for S. stercoralis were not available yet. First, in our study we used a pipeline of bio-informatics tools in order to classify sequences as miRNAs or not, on the basis of some characteristics considered typical of miRNAs: the mapping of the precursor on specific genomic locations as well as the prediction of the potential miRNA precursor 26 . We collected a catalog of 385 miRNA sequences, of which 44% (169/385) were novel, while others were shared with either the catalog of the miRNAs of S. ratti (53%, 206/385), S. papillosus (19%, 72/385) or C. elegans (1%, 4/385).
After the identification of miRNAs, we investigated their differential expression between L1 and iL3. We identified a total of 6 suggestive modulated miRNAs (nominal p-value < 0.05) common to S. ratti reported in a previous work 17 , but no data are available on the function of these miRNAs. Further investigation is needed to better understand their biological roles and possible association with the developmental switching between larval states. miRNAs demonstrated regulatory roles in modulating the expression of specific mRNAs in many organisms [27][28][29][30] . In line with our aims, using RNAseq we investigated the mRNAs targets of the miRNAs that we identified. We obtained mRNAs profiles that were previously reported to characterize the transcriptome of different stages of S. stercoralis. From the GSEA, L1 appeared to be transcriptionally more active than iL3 12 . In particular, we found up-regulation of metalloproteases in iL3 such as astacin that can be involved in skin penetration [31][32][33] and in the production of antigens, which have been found in sera of people with strongyloidiasis 34 . Moreover, proteins of the astacin family have been identified and extensively investigated in Strongyloididae, and are important to distinguish parasitic from non-parasitic lineages 33,[35][36][37] . Among the mRNAs, we investigated the potential 3′-UTR targets of the 6 suggestive modulated miRNAs. We found 33 predicted target transcripts Table 2. Gene-set enrichment analysis in L1 and iL3. ES enrichment score, NES normalised enrichment score (accounts for differences in gene set size), NOM p-val nominal p-value, FDR q-val false discovery rate q-value. www.nature.com/scientificreports/ involved in transcription, translation, metabolism and energy production, cytoskeleton rearrangements and signal transduction. These 33 transcripts significantly modulated between L1 and iL3. In particular, STR-MIR-34C-3p matched the majority of the transcripts (astacin or metalloproteases were not found). We speculate that a possible regulatory relationship between the differentially expressed miRNAs and their differentially expressed putative targets exists, based on available evidence in literature of orthologs of other species of Strongyloides, and also of C. elegans 38 . For instance, MIP-T3 was over-expressed in iL3 and putatively targeted by STR-MIR-34C-3p. MIP-T3 is a conserved protein across species from worms to humans, and plays a critical role in C. elegans in assembling functional intraflagellar transport complexes in biogenesis of sensory cilia 39 . Comparisons of the C. elegans amphid neuroanatomy to those of free-living and parasitic nematodes such as S. stercoralis, have shown that amphid neurons are broadly conserved 40 . S. stercoralis has evolved a lamellar morphology for its putative thermosensory neuron ALD hypothesizing a mediation of thermotaxis by the skin-penetrating infective larva 41 and with possible functional similarity to AFD neurons in C. elegans 42 . These data suggest a role of STR-MIR-34C-3p in the nervous system patterns of S. stercoralis. Moreover, we found pyruvate kinase (pyk) putatively targeted by STR-MIR-7880M-5p. Pyk is an important enzyme involved in the subpathway of glycolysis and it www.nature.com/scientificreports/ was reported with low expression in parasitic females of S. ratti suggesting their anaerobic energy metabolism 43 . In our samples of S. stercoralis, pyk was expressed less in iL3 than in IL1, thus indicating a possible role of STR-MIR-7880M-5p in enzymatic and metabolic transition of different stages. Overall, our findings of miRNA and putative targets warrant further investigations to better understand the role of miRNAs in S. stercoralis. First, it will be necessary to verify the predicted mRNA targets and the potential effect of miRNAs on their modulation. Moreover, studying the potential interaction between S. stercoralis miRNAs and human target transcripts could provide new insights in the host-pathogen interaction for strongyloidiasis.
Limitations of the study. In presenting these results, we are aware that a larger sample size would be needed to achieve statistically reliable results. However, based on a robust approach, we believe that our findings are relevant for future analyses.

Conclusions
We presented the transcriptome and the first miRNome profiles of both L1 and iL3 larval stages of S. stercoralis isolated from human stool. Overall, our findings provide insights for future 'omics' investigations of S. stercoralis and the development of novel approaches for detection of this parasite. Further studies are required to verify miRNAs that we identified in a larger sample size and to confirm whether miRNAs are good targets for the diagnosis of strongyloidiasis. Moreover, the combination of transcriptomic and proteomic data could allow for better characterization of differential gene expression and critical pathways in the different stages of S. stercoralis.

Methods
Sample collection and isolation of larvae. Faecal samples (n total = 8) were obtained at the IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, Verona (Italy) from a total of five patients from sub-Saharan Africa, infected with S. stercoralis. Precisely, we obtained paired L1 and iL3 only from 3 patients because of high parasitaemia. The isolation of larvae was performed by Baermann funnel technique. L1 larvae (n total = 3) were recovered from freshly deposited stool samples; iL3 larvae (n total = 5) were recovered after seven days from the charcoal coproculture at 26 °C. All L1 and iL3 larvae were harvested and concentrated by centrifugation for 10 min at 1000 g, washed three times in 1 mL of phosphate buffered saline (PBS) pH 7.2 containing antibiotic/antimycotic cocktail (100 U/mL penicillin, 100 μg/mL streptomycin and 0.25 μg/mL amphotericin B). Decontaminated larvae were subsequently stored in Qiazol reagent (Qiagen) at − 80 °C. The flowchart of the present study is reported in Figs. 3

and 4.
Isolation of total RNA from larvae. Total RNA was extracted by thawing at room temperature the samples of L1 (2000 larvae) and iL3 (2000 larvae). Then, a mechanical disruption was performed by adding ceramic beads and using the MagNA Lyser Instrument (Roche). Samples were purified using RNeasy mini kit (Qiagen) following the manufacturer's protocol. The RNA was quantified with Nanodrop Spectrophotometer (Thermo Scientific) and the RNA integrity (RIN > 8) was evaluated using capillary electrophoresis with 2100 Bioanalyzer (Agilent).
S. stercoralis polyadenylated mRNA and miRNA library preparation and sequencing. RNAseq libraries were generated from 1000 ng of total RNA using the TruSeq Stranded mRNA Library Prep Kit according to the manufacturer's protocol (Illumina, Inc., http:// www. illum ina. com). In each sample polyadenylated coding mRNA was isolated using olido-dT beads and reverse transcribed using random primers before generating final stranded libraries. miRNA libraries were prepared from 300 ng of total RNA using QIAseq miRNA Library Kit (Qiagen). Briefly, 3′ and 5′ adapters were ligated to mature miRNAs and used for cDNA synthesis and PCR amplification. The quality of the libraries was assessed using Fragment Analyzer (Agilent).
The average library size (= length of fragments including adapters) was 320 bp and 180 bp for RNAseq and miRNASeq, respectively. The numbers of cycles of amplification for the libraries were 12 and 16 for RNAseq and miRNASeq, respectively. The adapters used for multiplexing were the ones included in the Illumina TruSeq Index Adapters (Index 1-27) kit and QIAGEN QIAseq miRNA NGS 48 Index kit, for RNAseq and miRNASeq, respectively.Libraries were sequenced on the Illumina Nextseq500 system (Illumina) and 75 bp single-end reads were generated. A minimum of 35 million and 10 million fragments per sample were analysed for RNASeq and miRNASeq respectively. Quality control: adapter sequence removal and low-quality sequence trimming. All raw datasets produced by deep sequencing from each library were subjected to further quality control processing, including removal of low-quality reads, 5′-and 3′-adapter contaminants and poly (A) tails. Low quality sequences (quality score threshold Q < 35) at 3′ and 5′ were then trimmed from both ends of the clean reads.
Quality control steps of mRNAseq and miRNAseq removed ~ 1-10% and ~ 6-20% of the reads, respectively. In the case of miRNAseq, many reads were discarded because having a too short insert.
The quality controls steps for miRNA sequencing data were performed using the computer program Cutadapt  www.nature.com/scientificreports/ Transcript assembly and quantification. Reads derived from mRNAs and miRNAs were processed separately. Raw reads (mRNAs and miRNAs) from all samples were aligned to the draft S. stercoralis genome sequence (WormBase ParaSite, version WBPS14) and then assembled into potential transcripts including the 3′ and 5′UTRs.
The S. stercoralis reference genome sequence and annotations were arranged to prepare the specific index files required by the different computer programs used.
Transcript quantification (mRNAs) was performed using a selective alignment with a decoy-aware transcriptome (S. stercoralis reference genome and transcriptome from WormBase ParaSite) in order to mitigate potential spurious mapping of reads that actually arise from some unannotated genomic locus that is sequence-similar to an annotated transcriptome. The quantification step (mRNAs and miRNAs) estimated the expression values of each reference transcript as the relative abundance of units of Transcripts Per Million (TPM) and number of reads mapping to each transcript that was quantified.
The Salmon output (number of reads mapping to each transcript) was then processed by DESeq2 package of R software to identify modulated transcripts/genes. Functional annotation. The emapper-2.0.1 tool 44,45 , that uses precomputed orthologous groups and phylogenies from the eggNOG database to transfer functional information from fine-grained orthologs, was used in order to obtain a functional annotation. The aminoacidic sequences of the S. stercoralis proteins (ver. WBPS14), available at parasite.wormbase.org, were used as an input file. In this way, the genes were classified in 24 eggNOG sub-categories, with the 4 main categories being: information storage and processing, cellular processes and signaling, metabolism, and poorly characterized. miRNA sequencing and quantification. The analysis of miRNA went through multi-steps procedure including a preprocessing of data, detection of known and unknown miRNAs, and quantification and expression profiling. All these steps were done using the different modules of the computer program MiRDeep2 (https:// github. com/ rajew sky-lab/ mirde ep2).
Only cleaned sequences (see "Quality control: adapter sequence removal and low-quality sequence trimming" section) that aligned to the S. stercoralis genome sequences (WormBase ParaSite, version WBPS14) were retrieved for the S. stercoralis miRNA sequences identification.
The miRNA identification process and the following estimation of the expression levels were carried out in two steps: first we identified the already known miRNAs (miRBase release 22.1-the microRNA database; https:// www. mirba se. org/) and then, we identified sequences of miRNA not yet described. For the known miRNAs, as the miRNA database does not contain S. stercoralis miRNAs, we chose the miRNA sequences of the phylogenetically close S. ratti.
Expression levels of each (known or unknown) miRNA were estimated by counting the number of sequences (in each sample) mapping on the miRNA genome region.
In order to assess the number of miRNA sequences shared across S. stercoralis, S. ratti, S. papillosus and C. elegans, we compared mature S. stercoralis miRNA sequences obtained from sequencing with S. ratti and C. elegans miRNAs available in miRbase, and S. papillosus miRNAs available in the literature 18 . The miRNA sequences were considered as shared among different organisms when they showed 100% sequence identity. The Venn diagram reported in Fig. 1A was generated with the VennDiagram R package version 1.7.3. Differential gene expression analyses. Gene Transcripts and miRNAs were analysed separately. The analyses to identify either modulated transcripts or modulated miRNAs were conducted considering the link between a L1 and its corresponding iL3 (paired data) and Log2fold change was calculated as iL3/L1 ratio. Analyses were done using the DESeq2 library of the computer package R (version 4.0.3).
Gene set enrichment analysis (GSEA). The GSEA is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. GSEA v4.0.3 was used in this study to determine whether S. stercoralis gene sets showed biased expression in L1 and iL3. For this analysis we considered 28 gene sets: 24 gene sets were generated from the 24 eggNOG sub-categories, while 4 gene sets were manually generated to include genes known to be differentially expressed from the literature 12 , i.e., a heat-shock protein encoding gene set, a putative antigen gene set, a prokaryotes heatshock protein gene set, and a gene set containing genes encoding products known to be immunoreactive in S. stercoralis-infected humans. Additionally, four randomly generated gene sets were included in the analysis to test miRNA expression by RT-qPCR. To verify the expression of identified miRNAs by RNAseq in L1 and iL3, we investigated these by RT-qPCR. cDNA synthesis was performed by TaqMan Advanced miRNA Assays (ThermoFisher Scientific) from 10 ng of total RNA, according to manufacturer' instructions. We used customized commercial TaqMan Advanced probes (ThermoFisher Scientific) (sequences are not available as part of intellectual property rights). miRNA expression analysis was made using TaqMan Fast Advanced Master Mix (ThermoFisher Scientific) according to the manufacturer' instructions. The real time amplifications included 20 s at 95 °C, followed by 40 cycles at 95 °C for 30 s, and at 60 °C for 30 s. Thermocycling and signal detection were performed with CFX-96 Touch Real-Time PCR Detection System (Biorad). Negative controls no RT and no template control (NTC) were used. Synthetic spike-in ath-miR159a was used as internal control of reaction. The expression levels were calculated for each sample group after normalization against three reference genes STR-miR-8366A-5p, STR-miR-92-3p and STR-miR-7880L-3p (Supplementary_2) using the NRQ method 48 .
mRNA 3′-UTR and miRNA target prediction. The tool exUTR v0.1.0 49 was used to predict three prime untranslated regions (3′-UTR) starting from StringTie assembled mRNA sequences in fasta format. Briefly, the workflow of the software was the following: transcripts went through an open reading frame (ORF) prediction step, ORFs were then validated by aligning them to Uniprot database (https:// www. unipr ot. org/) using BLASTP, filtered, and self-BLASTed against transcripts sequences to determine the stop codon in transcripts; finally, 3′-UTRs sequences were extracted. Differentially expressed miRNA sequences and predicted 3'-UTR sequences were used as input for PITA version 6 50 in order to perform the miRNA target prediction analysis. Results were then filtered considering seeds of size 7-8 with no mismatches and having ddG ≤ − 10.
Data analysis. Statistical analyses and graphical representations were performed using Graph Pad Prism v.8 software and R Statistical Software v.4.0.3. Multiple one-tail unpaired t-test analysis was used for the differential expression of miRNAs with qRT-PCR. A p-value < 0.05 was statistically significant.
Ethics declarations. The study protocol received ethical clearance from the competent Ethics Committee (Comitato Etico for Clinical Research of Verona and Rovigo Provinces, no. 8634/2019). All included patients signed an informed consent form for the donation of their biological samples for research purposes. All experiments were performed in accordance with relevant guidelines and regulations.

Data availability
All data generated or analysed during this study are included in article and its supplementary information files. Sequences have been submitted to the ENA database with project number PRJEB48672. The assembled data and scripts from this study can be requested from the corresponding author.